<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Inference

Optimum Intel can be used to load models from the [Hub](https://huggingface.co/models) and create pipelines to run inference with IPEX optimizations (including patching with custom operators, weight prepacking and graph mode) on a variety of Intel processors. For now support is only enabled for CPUs.


## Loading

You can load your model and apply IPEX optimizations (apply torch.compile except text-generation tasks). For supported architectures like LLaMA, BERT and ViT, further optimizations will be applied by patching the model to use custom operators.
For now, support is enabled for Intel CPU/GPU. Previous models converted to TorchScript will be deprecated in v1.22.

```diff
  import torch
  from transformers import AutoTokenizer, pipeline
- from transformers import AutoModelForCausalLM
+ from optimum.intel import IPEXModelForCausalLM

  model_id = "gpt2"
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  results = pipe("He's a dreadful magician and")
```

As shown in the table below, each task is associated with a class enabling to automatically load your model.

| Auto Class                           | Task                                 |
|--------------------------------------|--------------------------------------|
| `IPEXModelForSequenceClassification` | `text-classification`                |
| `IPEXModelForTokenClassification`    | `token-classification`               |
| `IPEXModelForQuestionAnswering`      | `question-answering`                 |
| `IPEXModelForImageClassification`    | `image-classification`               |
| `IPEXModel`                          | `feature-extraction`                 |
| `IPEXModelForMaskedLM`               | `fill-mask`                          |
| `IPEXModelForAudioClassification`    | `audio-classification`               |
| `IPEXModelForCausalLM`               | `text-generation`                    |
| `IPEXModelForSeq2SeqLM`              | `text2text-generation`               |
